DataFrame

A DataFrame is a two-dimensional data structure: data is arranged as a table of rows and columns.

Syntax:

pandas.DataFrame(data, index, columns, dtype, copy)

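Parameter  Description
data       Input data in various forms, such as an ndarray, list, dict, or constant (scalar)
index      Row labels. Values must be hashable and the same length as the data; duplicates are allowed. If no index is passed, the default is a RangeIndex, equivalent to np.arange(n)
columns    Column labels. If none are passed, the default is likewise 0, 1, ..., n-1
dtype      Data type to force. If omitted, the data type is inferred automatically
copy       Whether to copy the input data. In pandas 1.3 through 2.x the default is None, which copies dict input but not DataFrame or 2-D ndarray input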
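
As a minimal sketch of how these parameters fit together, the snippet below builds a DataFrame from a NumPy array, once with default labels and once with explicit ones. The array values and the labels 'r1', 'r2', 'x', 'y', 'z' are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: a 2x3 array of integers.
values = np.array([[1, 2, 3], [4, 5, 6]])

# No index or columns passed: pandas supplies default integer labels.
df_default = pd.DataFrame(values)

# Explicit row labels, column labels, and a forced dtype.
df_labeled = pd.DataFrame(
    values,
    index=['r1', 'r2'],
    columns=['x', 'y', 'z'],
    dtype=float,
)

print(df_default.index)   # a RangeIndex covering 0 and 1
print(df_labeled.dtypes)  # every column is float64
```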

1> Create an empty DataFrame

import pandas as pd

df = pd.DataFrame()
print(f'Empty DataFrame:\n{df}')

# Output:
#  Empty DataFrame:
#  Empty DataFrame
#  Columns: []
#  Index: []
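
An empty DataFrame is mostly useful as a starting point that gets filled in later. The sketch below checks that the frame really is empty and then attaches a column; the column name 'score' and its values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame()

# An empty frame has no rows and no columns.
print(df.empty)   # True
print(df.shape)   # (0, 0)

# Columns can still be attached afterwards; 'score' is a made-up name.
df['score'] = [90, 85]
print(df.shape)   # (2, 1)
```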

2> Create a DataFrame from lists

A DataFrame can be created from a single list or from a nested list.

data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)
print(f'DataFrame from a single list:\n{df}')

# Output:
#  DataFrame from a single list:
#     0
#  0  1
#  1  2
#  2  3
#  3  4
#  4  5

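data = [['xiao meng', 20], ['xiao zhi', 21], ['xiao qiang', 23]]
df = pd.DataFrame(data, columns=['name', 'age'])
# Convert only the numeric column. Passing dtype=float to the constructor
# would also be applied to the string column, which raises an error in pandas 2.x.
df['age'] = df['age'].astype(float)
print(f'DataFrame from a nested list:\n{df}')

# Output:
#  DataFrame from a nested list:
#           name   age
#  0   xiao meng  20.0
#  1    xiao zhi  21.0
#  2  xiao qiang  23.0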

3> Create a DataFrame from a dictionary

data = {'Name': ['xiao meng', 'xiao zhi', 'xiao qiang', 'xiao wang'], 'Age': [20, 21, 23, 22]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(f'DataFrame from a dictionary:\n{df}')

# Output:
#  DataFrame from a dictionary:
#               Name  Age
#  rank1   xiao meng   20
#  rank2    xiao zhi   21
#  rank3  xiao qiang   23
#  rank4   xiao wang   22
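
When a DataFrame is built from a dictionary of lists, every list must have the same length (and match the length of any index passed). A small sketch of what happens otherwise, using hypothetical mismatched data:

```python
import pandas as pd

# Hypothetical data: 'Name' has two entries but 'Age' has three.
bad_data = {'Name': ['xiao meng', 'xiao zhi'], 'Age': [20, 21, 23]}

try:
    pd.DataFrame(bad_data)
    raised = False
except ValueError as exc:
    # pandas refuses to build a frame from columns of unequal length.
    raised = True
    print(f'Construction failed: {exc}')
```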

4> Create a DataFrame from a list of dictionaries

data = [{'a': 1, 'b': 3}, {'a': 4, 'b': 10, 'c': 8}]
df_1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
print(df_1)

# Output:
#          a   b
#  first   1   3
#  second  4  10

# 'b1' is not a key in any of the dictionaries, so that column is filled with NaN
df_2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df_2)

# Output:
#          a  b1
#  first   1 NaN
#  second  4 NaN
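
For comparison, here is a sketch of the same list of dictionaries passed without a columns argument. In that case the columns are taken from the dictionary keys, and a key missing from one dictionary shows up as NaN in that row.

```python
import pandas as pd

data = [{'a': 1, 'b': 3}, {'a': 4, 'b': 10, 'c': 8}]

# No columns argument: the columns come from the keys of all dictionaries.
df = pd.DataFrame(data)
print(df)

# The first dictionary has no 'c' key, so row 0 holds NaN in column 'c'.
print(df['c'].isna().tolist())  # [True, False]
```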

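5> Create a DataFrame from a dictionary of Series

# The row index is the union of the Series indexes; labels missing from a Series become NaN
dict_v = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
}
df = pd.DataFrame(dict_v)
print(df)

# Output:
#     one  two
#  a  1.0    1
#  b  2.0    2
#  c  3.0    3
#  d  NaN    4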
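
With a dictionary of Series, the index argument does not relabel the rows; it selects and orders them by label. A minimal sketch, reusing the same two Series and an arbitrarily chosen pair of labels:

```python
import pandas as pd

dict_v = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
}

# Keep only rows 'd' and 'b', in that order.
df = pd.DataFrame(dict_v, index=['d', 'b'])
print(df)

# 'one' has no label 'd', so that cell is NaN.
print(pd.isna(df.loc['d', 'one']))  # True
```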

6> Column selection

dict_v = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
}
df = pd.DataFrame(dict_v)
# Selecting one column by name returns a Series
print(df['one'])

# Output:
#  a    1.0
#  b    2.0
#  c    3.0
#  d    NaN
#  Name: one, dtype: float64
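
Selecting with a single label returns a Series, while selecting with a list of labels returns a DataFrame, even when the list has one entry. A short sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6], 'three': [7, 8, 9]})

single = df['one']             # one label -> Series
multiple = df[['one', 'two']]  # list of labels -> DataFrame
one_wide = df[['one']]         # one-element list -> still a DataFrame

print(type(single).__name__)    # Series
print(type(multiple).__name__)  # DataFrame
print(one_wide.shape)           # (3, 1)
```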

7> Column addition

dict_v = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
}
df = pd.DataFrame(dict_v)
# The new Series is aligned on the row index; row 'd' has no value and becomes NaN
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(f'Add a new column from a Series:\n{df}')

# Output:
#  Add a new column from a Series:
#     one  two  three
#  a  1.0    1   10.0
#  b  2.0    2   20.0
#  c  3.0    3   30.0
#  d  NaN    4    NaN

df['four'] = df['one'] + df['three']
print(f'Add a new column from existing columns:\n{df}')

# Output:
#  Add a new column from existing columns:
#     one  two  three  four
#  a  1.0    1   10.0  11.0
#  b  2.0    2   20.0  22.0
#  c  3.0    3   30.0  33.0
#  d  NaN    4    NaN   NaN
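
Bracket assignment always appends the new column at the end and modifies the frame in place. Two alternatives are sketched below: assign(), which returns a new DataFrame and leaves the original untouched, and insert(), which places a column at a chosen position. The column names 'total' and 'zero' are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]})

# assign() returns a copy with the extra column; df itself is unchanged.
with_total = df.assign(total=df['one'] + df['two'])

# insert() modifies df in place and puts the column at position 0.
# The scalar 0 is broadcast to every row.
df.insert(0, 'zero', 0)

print(list(with_total.columns))  # ['one', 'two', 'total']
print(list(df.columns))          # ['zero', 'one', 'two']
```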

8> Column deletion

dict_v = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
    'three': pd.Series([10, 20, 30], index=['a', 'b', 'c']),
}
df = pd.DataFrame(dict_v)
print(f'Initial DataFrame:\n{df}')

# Output:
#  Initial DataFrame:
#     one  two  three
#  a  1.0    1   10.0
#  b  2.0    2   20.0
#  c  3.0    3   30.0
#  d  NaN    4    NaN

del df['one']
print(f'After deleting the first column with the del statement:\n{df}')

# Output:
#  After deleting the first column with the del statement:
#     two  three
#  a    1   10.0
#  b    2   20.0
#  c    3   30.0
#  d    4    NaN

# pop() removes the column in place and returns it as a Series
df.pop('two')
print(f'After deleting a column with pop():\n{df}')

# Output:
#  After deleting a column with pop():
#     three
#  a   10.0
#  b   20.0
#  c   30.0
#  d    NaN
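
Both del and pop() change the DataFrame in place. When the original should be kept, drop(columns=...) returns a new frame instead. The sketch below, with made-up values, contrasts the two and also captures the Series that pop() returns.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})

# drop() returns a new DataFrame; df still has all three columns.
trimmed = df.drop(columns=['one'])

# pop() removes 'two' from df and hands back the removed column.
popped = df.pop('two')

print(list(trimmed.columns))  # ['two', 'three']
print(list(df.columns))       # ['one', 'three']
print(popped.tolist())        # [3, 4]
```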

9> Row selection

# Select a row by its label with loc
dict_v = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
    'three': pd.Series([10, 20, 30], index=['a', 'b', 'c']),
}
df = pd.DataFrame(dict_v)
print(df.loc['b'])

# Output:
#  one       2.0
#  two       2.0
#  three    20.0
#  Name: b, dtype: float64

# Select a row by its integer position with the iloc indexer
print(df.iloc[2])

# Output:
#  one       3.0
#  two       3.0
#  three    30.0
#  Name: c, dtype: float64
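
Both indexers also accept slices to select several rows at once. One difference is easy to miss: an iloc slice excludes its end position, as ordinary Python slices do, while a loc slice includes its end label. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

# Positions 1 and 2; position 3 is excluded.
by_position = df.iloc[1:3]

# Labels 'b' through 'd'; the end label 'd' is included.
by_label = df.loc['b':'d']

print(list(by_position.index))  # ['b', 'c']
print(list(by_label.index))     # ['b', 'c', 'd']
```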

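10> Row addition and deletion

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
# DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same result
df = pd.concat([df1, df2])
print(f'Original data:\n{df}')

# Output:
#  Original data:
#     a  b
#  0  1  2
#  1  3  4
#  0  5  6
#  1  7  8

# drop(0) removes every row labelled 0, so two rows are dropped here
df = df.drop(0)
print(f'After dropping row label 0:\n{df}')

# Output:
#  After dropping row label 0:
#     a  b
#  1  3  4
#  1  7  8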
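
Because the combined frame keeps the original row labels, label 0 occurs twice and drop(0) removes both rows. If only one row should go, one option is to renumber the rows while combining. This is a sketch using the ignore_index option of pd.concat, not the only way to do it.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# ignore_index=True discards the old labels and numbers the rows 0..3.
combined = pd.concat([df1, df2], ignore_index=True)

# Each label is now unique, so drop(0) removes exactly one row.
remaining = combined.drop(0)

print(list(combined.index))   # [0, 1, 2, 3]
print(list(remaining.index))  # [1, 2, 3]
```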